SNP enrichment for genomic annotations
[Figure panels; recoverable axis labels: median distance to TES, median distance to TSS]



Influence of different genomic parameters on delta-overlap. For each of the 1,000
simulated SNP sets tagging a variable percentage of DHS variants, we assessed whether
SNP structure (measured by the number of LD SNPs) or genomic features, such as
proximity to the TSS and TES, could influence the estimates of delta-overlap. For each
SNP set, we plot the median characteristics of the set against the delta-overlap. While
we observed that the proportion of causal variants within DHS regions was proportional
to the delta-overlap, for a fixed proportion of causal DHS variants delta-overlap is
stable across all genomic features (r² < 0.1). We observed a weak correlation with the
number of LD SNPs in SNP sets that were highly saturated with DHS-tagging loci
(> 60%).
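The stability check described in this caption amounts to computing a squared Pearson correlation between a per-set genomic feature and the per-set delta-overlap. The following is a minimal sketch of that computation, not the code used for the figure; the function name `r_squared` and the toy values are hypothetical and chosen only for illustration.

```python
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable explains no variance
    return sxy ** 2 / (sxx * syy)

# Hypothetical toy values (not data from the study): one entry per SNP set.
median_tss_distance = [12000, 15000, 9000, 20000, 11000]  # bp
delta_overlap = [0.10, 0.11, 0.10, 0.09, 0.11]
print(round(r_squared(median_tss_distance, delta_overlap), 3))
```

In practice the same function would be applied separately within each fixed proportion of causal DHS variants, once per feature (number of LD SNPs, median distance to TSS, median distance to TES).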
Supplementary Figure 2
Schematic figure of the strategy to simulate the GWAS SNP sets. From 1000
Genomes common European variants, we pick a functional SNP that maps within a
pre-defined genomic annotation, e.g. intron. To imitate a GWAS approach, we then
identify the SNP on the genotyping array (Illumina Human Omni2.5 chip) that best
tags (r² > 0.5) the selected predefined functional variant. We construct sets of 1,416
SNPs tagging predefined functional variants. These sets are then subjected to
enrichment tests.
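The tagging step of this simulation can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline used in the study: pairwise LD is assumed to be available as a dictionary keyed by unordered SNP pairs, a functional SNP that is itself on the array is assumed to tag itself with r² = 1, and duplicate tag SNPs are dropped. The names `best_tag` and `simulate_snp_set` are hypothetical.

```python
import random

def best_tag(functional_snp, array_snps, ld, min_r2=0.5):
    """Return the array SNP with the highest r2 to functional_snp,
    or None if no array SNP exceeds min_r2."""
    best, best_r2 = None, min_r2
    for snp in array_snps:
        if snp == functional_snp:
            r2 = 1.0  # assumption: a SNP on the array tags itself perfectly
        else:
            r2 = ld.get(frozenset((functional_snp, snp)), 0.0)
        if r2 > best_r2:
            best, best_r2 = snp, r2
    return best

def simulate_snp_set(functional_snps, array_snps, ld, set_size, seed=0):
    """Draw functional SNPs in random order and collect their best tags
    until set_size distinct tag SNPs are found (or candidates run out)."""
    rng = random.Random(seed)
    candidates = list(functional_snps)
    rng.shuffle(candidates)
    tags, seen = [], set()
    for functional in candidates:
        tag = best_tag(functional, array_snps, ld)
        if tag is not None and tag not in seen:
            seen.add(tag)
            tags.append(tag)
        if len(tags) == set_size:
            break
    return tags

# Hypothetical toy LD table: keys are unordered SNP pairs, values are r2.
toy_ld = {
    frozenset(("f1", "a1")): 0.9,
    frozenset(("f1", "a2")): 0.6,
    frozenset(("f2", "a2")): 0.4,
}
print(simulate_snp_set(["f1", "f2"], ["a1", "a2"], toy_ld, set_size=1416))
```

With real data, `set_size=1416` would reproduce the set size quoted in the caption; the toy input simply shows that a functional SNP with no array SNP above the r² threshold contributes no tag.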
Supplementary Figure 3
Quantification of measured overlap for Illumina Omni2.5 SNPs tagging
functional variants from different annotations. About 30% of SNPs on the Illumina
Omni2.5 overlap with DHS sites on their own. This percentage quickly increases as
additional linked SNPs are included. At r² = 0.8 we should observe the strongest
enrichment for promoter and 5'UTR SNPs, moderate signals for coding
(non-synonymous) and 3'UTR, and no enrichment for intron or intergenic regions.
Lowering the r² threshold increases the percentage of SNPs overlapping with DHS.
Importantly, when deriving the null through matching-based tests, not accounting for
the number of LD SNPs dramatically affects the enrichment results.
Supplementary Figure 4
Comparison of the effect sizes (delta-overlap) and power of different enrichment
methods. Not matching on the number of SNPs in LD results in global inflation of the
test statistics. For example, we observed p < 0.001 significance across all regulatory
and non-regulatory SNP sets. However, matching on LD alone is insufficient; we observed
consistently inflated type I error across SNP sets tagging functional variants from
introns (p < 0.05 in 81% of 1,000 SNP sets). We also observed that the standard
matching parameters were inadequate across SNP sets that tagged variants in 3'UTR
and exons. Accounting for the distance to the end of transcription (TES) was crucial if
functional variants were selected from the 3'UTR, and it decreased the false positive
rate by 26% (at p < 0.05). Note that including MAF in the matching parameters has no
effect, even though it is a frequently used parameter in SNP matching-based tests.
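A matching-based enrichment test of the kind compared here can be sketched as below. This is a simplified illustration, not the implementation evaluated in the figure. It assumes each SNP has already been assigned a tuple of binned matching properties (for example an LD-count bin and a TES-distance bin) and a precomputed flag indicating whether its locus overlaps the annotation. Delta-overlap is taken here to be the observed overlap minus the null mean, which is an assumption about the definition. All function and variable names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean

def matched_null(test_snps, background, keys, overlaps, n_iter=1000, seed=0):
    """Return (observed overlap fraction, list of null fractions).

    keys:     dict SNP -> tuple of binned matching properties
    overlaps: dict SNP -> True if the SNP's locus overlaps the annotation
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for snp in background:
        pools[keys[snp]].append(snp)
    for snp in test_snps:
        if not pools[keys[snp]]:
            raise ValueError(f"no background SNP matches {snp!r}")
    observed = sum(overlaps[s] for s in test_snps) / len(test_snps)
    null = []
    for _ in range(n_iter):
        # Replace every test SNP with a random background SNP from its bin.
        drawn = [rng.choice(pools[keys[s]]) for s in test_snps]
        null.append(sum(overlaps[s] for s in drawn) / len(drawn))
    return observed, null

def empirical_p(observed, null):
    """One-sided empirical p-value with the usual +1 correction."""
    return (1 + sum(v >= observed for v in null)) / (1 + len(null))

# Hypothetical toy data: the single key component plays the role of an LD bin.
toy_overlaps = {"t1": True, "b1": False, "b2": False, "b3": True, "b4": True}
ld_matched = {"t1": (1,), "b1": (0,), "b2": (0,), "b3": (1,), "b4": (1,)}
unmatched = {snp: () for snp in toy_overlaps}
toy_background = ["b1", "b2", "b3", "b4"]

for label, keys in (("matched on LD bin", ld_matched), ("unmatched", unmatched)):
    obs, null = matched_null(["t1"], toy_background, keys, toy_overlaps, n_iter=200)
    print(label, "delta-overlap:", obs - mean(null), "p:", empirical_p(obs, null))
```

In the toy data, overlap is driven entirely by the LD bin, so matching on it removes the apparent enrichment while the unmatched null leaves a positive delta-overlap. Adding a further component to the key tuple (such as a TES-distance bin) is how additional confounders would be controlled in this scheme.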
Supplementary Figure 5
Comparison of the significance of DHS enrichment in the NHGRI GWAS Catalog using
different matching parameters. The choice of matching parameters, in
particular the number of LD-linked SNPs, has a critical impact on the null distribution.
The most frequently used matching parameters (MAF, TSS, GEN) are insufficient to
adequately control for genomic confounders and greatly skew the null towards highly
significant results. The null was defined with 10,000 matched SNP sets or shifting
iterations. We observed that the GWAS Catalog SNPs were significantly more enriched
for DHS compared to the matched SNP sets (85% vs. null mean of 69.9%; maximum
permuted enrichment was 74.15%, at p < 0.05). Using the number of LD SNPs as the only
matching parameter significantly shifted the null distribution towards the observed
value (null mean of 76.27%; maximum permuted enrichment of 79.52% at p < 0.05).
However, when we performed local shifts, we observed that the null distribution was
displaced toward the observed overlap even further (null mean = 81.28% and
maximum permuted enrichment 83.83%).
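The local-shift null mentioned in this caption can be illustrated with a short sketch. The details below are assumptions made for illustration rather than a description of the exact procedure: each locus is a half-open interval `[start, end)`, annotation intervals lie inside the locus, and in every iteration the annotations are shifted by a random offset with wrap-around at the locus boundaries while SNP positions stay fixed. All names are hypothetical.

```python
import random

def overlaps_any(snp_positions, intervals):
    """True if any SNP position falls inside any half-open interval."""
    return any(s <= p < e for p in snp_positions for s, e in intervals)

def shift_intervals(intervals, offset, locus_start, locus_end):
    """Circularly shift intervals within [locus_start, locus_end).
    Assumes every interval lies inside the locus."""
    length = locus_end - locus_start
    shifted = []
    for s, e in intervals:
        new_start = locus_start + (s - locus_start + offset) % length
        new_end = new_start + (e - s)
        if new_end <= locus_end:
            shifted.append((new_start, new_end))
        else:
            # The interval runs past the locus end: wrap the remainder around.
            shifted.append((new_start, locus_end))
            shifted.append((locus_start, locus_start + (new_end - locus_end)))
    return shifted

def local_shift_null(loci, n_iter=1000, seed=0):
    """loci: list of dicts with keys 'start', 'end', 'snps', 'annotations'.
    Returns (observed fraction of overlapping loci, list of null fractions)."""
    rng = random.Random(seed)
    observed = sum(overlaps_any(l["snps"], l["annotations"]) for l in loci) / len(loci)
    null = []
    for _ in range(n_iter):
        hits = 0
        for l in loci:
            offset = rng.randrange(l["end"] - l["start"])
            moved = shift_intervals(l["annotations"], offset, l["start"], l["end"])
            hits += overlaps_any(l["snps"], moved)
        null.append(hits / len(loci))
    return observed, null

# Hypothetical toy loci: one densely annotated, one sparsely annotated.
toy_loci = [
    {"start": 0, "end": 100, "snps": [50], "annotations": [(0, 100)]},
    {"start": 0, "end": 1000, "snps": [10, 500], "annotations": [(5, 15)]},
]
obs, null = local_shift_null(toy_loci, n_iter=200)
print("observed:", obs, "null mean:", sum(null) / len(null))
```

Because the shifted annotations keep their local density, a locus that is almost entirely covered by the annotation overlaps in nearly every iteration. This is one way to see why a local-shift null can sit closer to the observed overlap than a null built from SNPs matched elsewhere in the genome.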
Supplementary Table 1. Published statistical strategies within the literature to
assess enrichment for different annotations.

| Trait | Annotation | Enrichment method | Reference |
|---|---|---|---|
| NHGRI GWAS Catalog | TFs ChIP-seq, DHS | Matching on MAF, TSS, platform, UCSC gene predicted function | [1] |
| 144 diseases | NF-kB ChIP-seq | Matching on MAF, TSS | [2] |
| Platelet and erythrocyte phenotypes | FAIRE-seq | Matching on MAF, TSS and LD | [3] |
| Breast cancer | TF and histone modification ChIP-seq | LD | [4] |
| Primary biliary cirrhosis | DHS, FAIRE-seq | Permutations with SNPs within associated loci | [5] |
| Asthma | Chromatin HMM states, H3K4me1, H3K27ac | None | [6] |
| NHGRI GWAS Catalog | DHS | MAF, TSS, GENIC | [7] |
| NHGRI GWAS Catalog | Intronic splicing enhancers | MAF, TSS | [8] |
| NHGRI GWAS Catalog- | CNVs, TFBS, splice enhancers/silencers, | Relative to disease | |
| trans-eQTL SNPs | | Matching on MAF, TSS, genomic location | [10] |
| Migraine | DHS | MAF, TSS, GC content | [11] |
| Lipid levels | Chromatin HMM states, histone modifications, open chromatin | MAF, TSS, LD partners | [12] |
| eQTLs | Histone modifications | MAF, TSS | [13] |
| Colorectal cancer SNPs | Enhancers | LD, Illumina OmniExpress | |
| Immune-related disease SNPs | Enhancers | LD, Illumina OmniExpress | |
| Specific traits from GWAS Catalog; T2D and fasting glycemia | Human pancreatic islet TFBS and enhancer clusters | LD; TSS, MAF | [16] |
| GWAS Catalog traits | Chromatin states | Shifts and permutations within GWAS Catalog traits | [17] |
| Rare variants and structural variations (SVs)/SNPs/eQTLs | GENCODE annotation/functional annotations | Annotation shifting/MAF+TSS matching | [18] |

Supplementary Table 2. Proportion of 1,000 SNP sets, derived from causal
variants with specific functional annotations, demonstrating enrichment at
p < 0.05.

| Enrichment method | Promoter | 5'UTR | Non-synonymous | 3'UTR | Intron | Intergenic |
|---|---|---|---|---|---|---|
| GEN+TSS+TES+LD | 1 | 1 | 0.002 | 0.102 | 0.013 | 0 |
| GEN+MAF+TSS+LD | 1 | 1 | 0.001 | 0.358 | 0.016 | 0 |
| GEN+MAF+TSS | 1 | 1 | 1 | 1 | 1 | 1 |
| GEN+TSS+LD | 1 | 1 | 0.001 | 0.342 | 0.017 | 0 |
| MAF+TSS+LD | 1 | 1 | 0.114 | 0.959 | 0.708 | 0 |
| MAF+TSS | 1 | 1 | 1 | 1 | 1 | 1 |
| LD | 1 | 1 | 0.875 | 0.994 | 0.808 | 0 |
| Local shifts | 1 | 1 | 0.604 | 0.112 | 0.044 | 0.074 |